An Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data
Authors
Abstract
Outlier detection is an important task in the field of data mining and a highly active area of research in machine learning. In industrial automation, datasets are often high-dimensional, meaning that an effort to study all dimensions directly leads to data sparsity, thus causing outliers to be masked by noise effects in high-dimensional spaces. The “curse of dimensionality” phenomenon renders many conventional outlier detection methods ineffective. This paper proposes a new outlier detection algorithm called EOEH (An Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data). First, random secondary subsampling is performed on the data, and detectors are run on various small-scale sub-samples to provide diverse detection results. Results are then aggregated to reduce the global variance and enhance the robustness of the algorithm. Subsequently, information entropy is utilized to construct a dimension-space weighting method that can discern the influential factors within different dimensional spaces. This method generates weighted subspaces and dimensions for data objects, reducing the impact of noise created by high-dimensional data and improving detection performance on high-dimensional data. Finally, this study offers a design for a new high-precision local outlier factor (HPLOF) detector that amplifies the differentiation between normal and outlier data, thereby improving the detection performance of the algorithm. The feasibility of the algorithm is validated through experiments that used both simulated and UCI datasets. In comparison with popular outlier detection algorithms, our algorithm demonstrates superior detection performance and runtime efficiency. Compared with the current popular, common algorithms, EOEH improves detection performance by 6% on average. In terms of running time on high-dimensional data, EOEH is 20% faster than the current popular algorithms.
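The pipeline described in the abstract (entropy-based dimension weights, detectors run on random sub-samples, aggregated scores) can be illustrated with a minimal Python sketch. This is not the authors' EOEH implementation. The histogram-based entropy estimate, the rule that a dimension's weight is its share of the total entropy, and the k-nearest-neighbour distance score (standing in for the HPLOF detector, which the abstract does not define) are all assumptions made for illustration, as are every function name and parameter value.

```python
import math
import random


def dimension_entropy(values, bins=10):
    """Shannon entropy (bits) of one dimension from an equal-width histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant dimension carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def entropy_weights(data, bins=10):
    """Assumed rule: a dimension's weight is its share of the total entropy."""
    dims = len(data[0])
    ents = [dimension_entropy([row[d] for row in data], bins) for d in range(dims)]
    total = sum(ents)
    if total == 0:
        return [1.0 / dims] * dims
    return [e / total for e in ents]


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_score(point, sample, weights, k):
    """Mean weighted distance to the k nearest sample points.

    A simple stand-in detector (not HPLOF); the point itself is skipped.
    """
    dists = sorted(weighted_distance(point, s, weights)
                   for s in sample if s is not point)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def ensemble_scores(data, n_rounds=20, sample_size=16, k=3, seed=0):
    """Average the detector's score for every point over random sub-samples."""
    rng = random.Random(seed)
    weights = entropy_weights(data)
    totals = [0.0] * len(data)
    for _ in range(n_rounds):
        sample = rng.sample(data, min(sample_size, len(data)))
        for i, point in enumerate(data):
            totals[i] += knn_score(point, sample, weights, k)
    return [t / n_rounds for t in totals]


# Toy data: 60 Gaussian points in 5 dimensions plus one planted outlier.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
data.append([8.0] * 5)
scores = ensemble_scores(data)
top = max(range(len(data)), key=scores.__getitem__)
print("highest-scoring index:", top)
```

Averaging over many small sub-samples is what reduces the variance of any single detector run. Swapping `knn_score` for a density-based detector would bring the sketch closer to the local-outlier-factor family that the abstract mentions.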
Similar Resources
Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data
We propose an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space. In particular, for each object in the data set, we explore the axis-parallel subspace spanned by its neighbors and determine how much the object deviates from the neighbors in this subspace. In our experiments, we show that our novel subspace outlier detection is super...
Outlier detection for high dimensional data pdf
Is particularly useful for high dimensional data where outliers cannot be found. High dimensional data in Euclidean space pose special challenges to data. In about just the last few years, the task of unsupervised outlier detection has found. Outlier detection is an outstanding data mining task referred to...
Outlier detection for high-dimensional data
Outlier detection is an integral component of statistical modelling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are usually not applicable. We propose an outlier detection procedure that replaces the classical minimum covariance determinant estimator with a high-breakdown minimum diagonal product estimator. The cut-off value is obtained from the...
Disk-Based Sampling for Outlier Detection in High Dimensional Data
We propose an efficient sampling based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a “sampling” strategy with a simple randomized partitioning technique to generate a candidate set of outliers. This phase requires one full data scan and the running time has linear complexity with respect to the size and dimensionali...
Fast target detection method for high-resolution SAR images based on variance weighted information entropy
Since the traditional CFAR algorithm is not suitable for high-resolution target detection in synthetic aperture radar (SAR) images, a new two-stage target detection method based on variance weighted information entropy is proposed in this paper. In the first stage, the regions of interest (ROIs) in the SAR image are extracted based on the variance weighted information entropy (WIE), which has been p...
Journal
Journal title: Entropy
Year: 2023
ISSN: 1099-4300
DOI: https://doi.org/10.3390/e25081185